Stationarity
A stationary time series is one whose properties do not depend on the time at which the series is observed. A time series with trend or seasonality is not stationary but a time series with cyclic variation but no trend and seasonality is stationary because the cycles do not have any fixed frequency. A white noise process is stationary.
In general, a stationary time series has no predictable pattern in long term.
Differencing
Computing the differences between consecutive observations to make the series stationary is called differencing. Differencing can help stabilise the mean of the series by removing changes in the level of a time series, and therefore eliminating trend and seasonality.
ACF plot is one of way of identifying non stationary time series. ACF plot of stationary series will drop to zero more quickly.
library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
ggAcf(goog200)
ggAcf(diff(goog200))
We can see the ACF of the differencing series above becomes stationary and ACF plot shows no significant correlation.
We can also do 2nd order differencing as well.
The differenced series is the change between consecutive observations in the original series. When the differenced series is a white noise, the model for original series can be written as
\[y_t - y_{t-1} = e_t\] where \(e_t\) denotes white noise. Rearranging this will gives us the random walk model.
=>\[y_t = y_{t-1} + e_t\]
Some properties of random walk process is 1. long periods of apparent trend up or down. 2. sudden and unpredictable changes in direction.
The forecasts from a random walk model are equal to the last observation, as future movements are unpredictable, and are equally likely to be up or down. Thus, the random walk model underpins naïve forecasts
Another related model is \[y_t - y_{t-1} = c + e_t\ or\ y_t = y_{t-1} + c + e_t\]
The value of c is the average of the changes between consecutive observations. If c is positive, then the average change is an increase in the value of \(y_t\). Thus, \(y_t\) will tend to drift upwards. If c is negative, then \(y_t\) will tend to drift downward. This is the model behind the drift method.
A seasonal difference is the difference between an observation and the previous observation from the same season. So, \[y_{t}^{'} = y_t - y_{t-m}\] where m = no. of seasons. If seasonally differenced data appears as white noise, then the model for original series would be \[y_t = y_{t-m} + e_t\]
Example :
cbind("Sales (in millions)" = a10,
"Monthly log sales" = log(a10),
"Annual changes in log scales" = diff(log(a10),12)) %>%
autoplot(facets = TRUE) +
xlab("Year") + ylab("") +
ggtitle("Antidiabetic drug sales")
Above, we are first controlling the variance by taking log and then doing seasonal differencing. The last series is the differenced series that looks like white noise but the mean is not zero. An intercept model will help to make the mean 0.
Sometimes it is necessary to take seasonal difference and first difference both to make a series stationary.
cbind("Billion Kwh" = usmelec,
"logs" = log(usmelec),
"seasonally differenced logs" = diff(log(usmelec), 12),
"Double differenced logs" = diff(diff(log(usmelec),12),1)) %>%
autoplot(facets = TRUE) +
xlab("Year") +
ylab("") +
ggtitle("Monthly US net electricity production")
There is no rule for how much differencing is required. This excercise is subjective to the analysts. But if there is a sesonality present in the data, we should do sesonal differencing first because that alone might make the series stationary.
Unit Root Test One way to determine if the differencing is required is to use unit root test. One such test is Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. In this test, the null hypothesis is that the data are stationary.
library(urca)
goog %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 10.7223
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The value is much bigger than the 1% critical value that means that the null hypothesis is rejected. Thus, the data is not not stationary.
goog %>% diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 0.0324
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Now the value is within the range so we can conclude that the data is stationary.
One way to do this test is to use ndiff function
ndiffs(goog)
## [1] 1
first order differencing is required.
For Seasonal differencing
nsdiffs(usmelec)
## [1] 1
Exercise
2. IBM Closing Stock Price
autoplot(ibmclose)
ggAcf(ibmclose)
ggPacf(ibmclose)
Autoplot of the series shows that the series is non-stationary as it wanders up and down over time. The ACF plot also decrease slowly over time suggest that the series is non-stationary. PACF plot shows a spike at lag 1 suggesting a first order differencing.
3. Finding Box-Cox and order of differencing Annual US net electricity generation
autoplot(usnetelec)
The data is alsmost linearly increasing. No transformation is required. It looks like only first order differencing is needed.
ndiffs(usnetelec)
## [1] 1
We can use KPSS test to check if differenced is stationary or not.
usnetelec %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.1585
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic value is within the range. Hence the differenced data is stationary.
Quarterly US GDP
autoplot(usgdp)
The data looks slightly non-linear. We will see if Box Cox tranformation makes it more linear or not.
autoplot(BoxCox(usgdp, lambda = BoxCox.lambda(usgdp))) + ylab("")
The data looks linearly increasing now. It looks like we just need first order differencing.
ndiffs(BoxCox(usgdp, lambda = BoxCox.lambda(usgdp)))
## [1] 1
BoxCox(usgdp, lambda = BoxCox.lambda(usgdp)) %>% diff() %>% autoplot()
The differenced series looks like white noise but the mean is not zero. An intercept will take care of that.
BoxCox(usgdp, lambda = BoxCox.lambda(usgdp)) %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic value is within the range. Hence the differenced data is stationary.
Monthly copper prices
autoplot(mcopper)
The data seems to have increasing variance for bigger price. We will use Box Cox transformation.
mcopper %>% BoxCox(lambda = BoxCox.lambda(mcopper)) %>% autoplot()
mcopper %>% BoxCox(lambda = BoxCox.lambda(mcopper)) %>% diff() %>% autoplot()
The data looks like white noise. We can test it using KPSS test.
mcopper %>% BoxCox(lambda = BoxCox.lambda(mcopper)) %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic value is within the range. Hence the differenced data is stationary.
Monthly US domestic enplanements
autoplot(enplanements)
The data has an increasing variance. It also has seasonality present. We will first use Box Cox transformation to constrol variance.
BoxCox(enplanements, lambda = BoxCox.lambda(enplanements)) %>% autoplot()
We will first take the seasonal difference and see if it makes series stationary or not.
BoxCox(enplanements, lambda = BoxCox.lambda(enplanements)) %>% diff(12) %>% autoplot()
The series do not look stationary. We will now check for first differencing.
BoxCox(enplanements, lambda = BoxCox.lambda(enplanements)) %>% diff(12) %>% ndiffs()
## [1] 1
1st order differencing is required.
BoxCox(enplanements, lambda = BoxCox.lambda(enplanements)) %>% diff(12) %>%
diff() %>% autoplot()
The series now looks stationary.
BoxCox(enplanements, lambda = BoxCox.lambda(enplanements)) %>% diff(12) %>%
diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0424
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The KPSS test also confirm that data is stationary.
Monthly Australian overseas vistors
autoplot(visitors)
The data is surely has an increasing variance. So we will use Box Cox transformation.
BoxCox(visitors, lambda = BoxCox.lambda(visitors)) %>% autoplot()
We will first take the seasonal difference and see if it makes series stationary or not.
BoxCox(visitors, lambda = BoxCox.lambda(visitors)) %>% diff(12) %>% autoplot()
The series do not look stationary. We will now check for first differencing.
BoxCox(visitors, lambda = BoxCox.lambda(visitors)) %>% diff(12) %>% diff() %>% autoplot()
BoxCox(visitors, lambda = BoxCox.lambda(visitors)) %>% diff(12) %>% diff() %>%
ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0158
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The series is now stationary.